New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems
Abstract
Causal message logging is an efficient approach to tolerating process failures in distributed systems because it combines the advantages of the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuing their computation, and they require some synchronous logging to stable storage during recovery. Although Elnozahy's protocol solves these problems, it suffers from its reliance on a central recovery leader. Additionally, if it is integrated with asynchronous checkpointing, it may lead to inconsistencies when concurrent failures occur. In this paper, we present a new causal message logging protocol with asynchronous checkpointing. It needs to maintain only the latest checkpoint of each process, and it allows live processes to continue their computation during recovery, even when concurrent failures occur. Moreover, the protocol solves the problems of Elnozahy's protocol and improves asynchrony during recovery, because each recovering process is responsible only for its own recovery.
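The abstract describes causal message logging only at a high level. The sketch below illustrates the general idea under simplifying assumptions of our own, and it is not the protocol proposed in the paper. Each delivery is recorded as a determinant (receiver, rsn, sender, ssn). Every outgoing message piggybacks all determinants the sender knows, unpruned, whereas real protocols piggyback only those not yet stable. Senders keep a volatile message log, and each process keeps only its latest checkpoint. A recovering process restores that checkpoint, gathers its own post-checkpoint determinants from live peers, and replays the logged messages in rsn order, with no central recovery leader. All names (`Determinant`, `Process`, `recover`) are hypothetical. Sends are not regenerated during replay, and concurrent failures are not handled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Delivery-order record: message (sender, ssn) was the rsn-th delivery at receiver."""
    receiver: str
    rsn: int   # receive sequence number at the receiver
    sender: str
    ssn: int   # send sequence number at the sender


@dataclass
class Process:
    name: str
    delivered: list = field(default_factory=list)  # application state: payloads in delivery order
    rsn: int = 0
    ssn: int = 0
    dets: set = field(default_factory=set)          # volatile determinant log (own and others')
    send_log: dict = field(default_factory=dict)    # sender-based message log: ssn -> payload
    checkpoint: tuple = ((), 0)                     # only the latest (state, rsn) is kept

    def send(self, payload):
        """Log the payload and return a message piggybacking every known determinant."""
        self.ssn += 1
        self.send_log[self.ssn] = payload
        return (self.name, self.ssn, payload, frozenset(self.dets))

    def deliver(self, msg):
        sender, ssn, payload, piggyback = msg
        self.dets |= piggyback                      # keep determinants we now causally depend on
        self.rsn += 1
        self.dets.add(Determinant(self.name, self.rsn, sender, ssn))
        self.delivered.append(payload)

    def take_checkpoint(self):
        # Taken without coordinating with other processes; overwrites the previous checkpoint.
        self.checkpoint = (tuple(self.delivered), self.rsn)

    def crash(self):
        # Volatile delivery state and determinants are lost. Sends are not re-executed
        # during replay in this sketch, so ssn and send_log are left untouched.
        self.delivered, self.rsn, self.dets = [], 0, set()


def recover(proc, live):
    """The recovering process drives its own recovery; no central leader is involved."""
    state, ckpt_rsn = proc.checkpoint
    proc.delivered, proc.rsn = list(state), ckpt_rsn
    # Collect this process's determinants for deliveries made after its checkpoint.
    needed = {d for p in live for d in p.dets
              if d.receiver == proc.name and d.rsn > ckpt_rsn}
    peers = {p.name: p for p in live}
    for det in sorted(needed, key=lambda x: x.rsn):
        proc.rsn += 1
        if proc.rsn != det.rsn:
            raise RuntimeError(f"missing determinant for rsn {proc.rsn}")
        proc.delivered.append(peers[det.sender].send_log[det.ssn])  # refetch from sender's log
        proc.dets.add(det)


if __name__ == "__main__":
    P, Q, R = Process("P"), Process("Q"), Process("R")
    Q.deliver(P.send("a"))   # Q: rsn 1
    Q.deliver(R.send("b"))   # Q: rsn 2
    Q.take_checkpoint()      # Q's only checkpoint: state ("a", "b"), rsn 2
    Q.deliver(R.send("d"))   # Q: rsn 3
    P.deliver(Q.send("e"))   # P now holds Q's determinants up to rsn 3
    Q.deliver(P.send("f"))   # Q: rsn 4; no other process depends on this delivery
    Q.crash()
    recover(Q, [P, R])
    print(Q.delivered)       # ['a', 'b', 'd']
```

In this run, P and R keep computing while Q recovers. Only the delivery of "d" is replayed, because it is the only post-checkpoint delivery whose determinant reached a live process (P). The delivery of "f" affected no other process, so dropping it leaves no orphan. P still holds "f" in its send log, and retransmitting such messages is outside the scope of this sketch.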